Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM Encoder-Decoder

Author: Kiros R.
Mikolov T.
Sutskever I.
Vosoughi S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/07/2016

We present Tweet2Vec, a novel method for generating general-purpose vector representations of tweets. The model learns tweet embeddings using a character-level CNN-LSTM encoder-decoder. We trained our model on 3 million randomly selected English-language tweets. The model was evaluated on two tasks, tweet semantic similarity and tweet sentiment categorization, and outperformed the previous state of the art on both. The evaluations demonstrate the power of the tweet embeddings generated by our model for various tweet categorization tasks. The vector representations generated by our model are generic and hence can be applied to a variety of tasks. Though the model presented in this paper is trained on English-language tweets, the method can be used to learn tweet embeddings for other languages. Comment: SIGIR 2016, July 17-21, 2016, Pisa. Proceedings of SIGIR 2016. Pisa, Italy (2016

arXiv.org e-Print Archive

DSpace@MIT

Crossref
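
The Tweet2Vec abstract says the encoder reads tweets at the character level. As a minimal sketch of what that input can look like, the code below turns a tweet into a fixed-size one-hot character matrix of the kind a character-level CNN consumes. The alphabet, the 150-character length, the zero-padding scheme and the name `one_hot_encode` are all assumptions for illustration, not the paper's configuration.

```python
import string

# Hypothetical alphabet: lowercase letters, digits and some common punctuation.
ALPHABET = string.ascii_lowercase + string.digits + " .,!?#@'\"-:;/()"
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}
MAX_LEN = 150  # assumed maximum tweet length in characters


def one_hot_encode(tweet: str, max_len: int = MAX_LEN) -> list[list[int]]:
    """Return a max_len x len(ALPHABET) one-hot matrix for the tweet.

    Characters outside the alphabet map to an all-zero row; tweets are
    truncated or zero-padded to exactly max_len rows.
    """
    matrix = []
    for ch in tweet.lower()[:max_len]:
        row = [0] * len(ALPHABET)
        idx = CHAR_INDEX.get(ch)
        if idx is not None:
            row[idx] = 1
        matrix.append(row)
    # Pad with all-zero rows so every tweet yields the same shape.
    while len(matrix) < max_len:
        matrix.append([0] * len(ALPHABET))
    return matrix
```

For example, `one_hot_encode("Hello #SIGIR")` yields 150 rows, of which the first 12 each contain a single 1 and the rest are padding. A fixed shape like this lets tweets be batched before the convolutional layers.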

Distributed Deep Learning for Question Answering

Author: Bottou L.
Chilimbi T.
Dean J.
Sutskever I.
Zhang S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 04/08/2016

This paper is an empirical study of distributed deep learning for two question answering subtasks: answer selection and question classification. Comparison studies of the SGD, MSGD, ADADELTA, ADAGRAD, ADAM/ADAMAX, RMSPROP, DOWNPOUR and EASGD/EAMSGD algorithms are presented. Experimental results show that a distributed framework based on the Message Passing Interface (MPI) can accelerate convergence at a sublinear scale. This paper demonstrates the importance of distributed training: for example, with 48 workers a 24x speedup is achievable for the answer selection task, reducing running time from 138.2 hours to 5.81 hours and significantly increasing productivity. Comment: This paper will appear in the Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM 2016), Indianapolis, US

arXiv.org e-Print Archive

Crossref
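
The speedup figures in the distributed question answering abstract can be checked with a quick calculation. The sketch below computes speedup (baseline time divided by distributed time) and parallel efficiency (speedup divided by worker count) from the numbers in the abstract. It assumes the 138.2-hour figure is the single-worker baseline, which the abstract implies but does not state outright.

```python
def speedup(serial_hours: float, parallel_hours: float) -> float:
    """Ratio of baseline (assumed single-worker) running time to distributed running time."""
    return serial_hours / parallel_hours


def efficiency(speedup_value: float, workers: int) -> float:
    """Fraction of ideal linear speedup actually achieved (1.0 would be linear)."""
    return speedup_value / workers


s = speedup(138.2, 5.81)
e = efficiency(s, 48)
print(f"speedup ~ {s:.1f}x, efficiency ~ {e:.0%}")  # speedup ~ 23.8x, efficiency ~ 50%
```

An efficiency of about 50% with 48 workers is what "sublinear" means here: doubling the workers does not halve the running time.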

Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images

Author: Graves A.
Kiros R.
Krizhevsky A.
Over P.
Simonyan K.
Srivastava N.
Sutskever I.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 04/08/2015

We address the problem of fine-grained action localization in temporally untrimmed web videos. We assume that only weak video-level annotations are available for training. The goal is to use these weak labels to identify temporal segments corresponding to the actions and to learn models that generalize to unconstrained web videos. We find that web images queried by action names serve as well-localized highlights for many actions, but are noisily labeled. To solve this problem, we propose a simple yet effective method that takes weak video labels and noisy image labels as input and generates localized action frames as output. This is achieved by cross-domain transfer between video frames and web images, using pre-trained deep convolutional neural networks. We then use the localized action frames to train action recognition models with long short-term memory networks. We collected FGA-240, a fine-grained sports action data set of more than 130,000 YouTube videos covering 240 fine-grained actions under 85 sports activities. Convincing results are shown on the FGA-240 data set, as well as on the THUMOS 2014 localization data set with untrimmed training videos. Comment: Camera-ready version for ACM Multimedia 201

arXiv.org e-Print Archive

CiteSeerX

Crossref
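
The temporal localization abstract describes transferring between video frames and web images using features from pre-trained CNNs. The sketch below is one simple way to express that idea, under stated assumptions: every frame and web image is already a feature vector, a frame is scored by its highest cosine similarity to any web image of the action, and the top-k frames are kept as "localized" frames. The scoring rule and the name `localize_frames` are illustrative simplifications, not the paper's exact procedure.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def localize_frames(frame_feats: list[list[float]],
                    image_feats: list[list[float]],
                    k: int) -> list[int]:
    """Return indices of the k frames most similar to any web image.

    frame_feats: features of frames from a video weakly labeled with the action.
    image_feats: features of web images retrieved by querying the action name.
    """
    scores = [max(cosine(f, img) for img in image_feats) for f in frame_feats]
    ranked = sorted(range(len(frame_feats)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # return the selected frames in temporal order
```

With toy 2-D features, `localize_frames([[1, 0], [0, 1], [0.9, 0.1], [0.1, 0.9]], [[1, 0]], 2)` selects frames 0 and 2, the two closest to the single web image.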

A Hierarchical Recurrent Encoder-Decoder For Generative Context-Aware Query Suggestion

Author: Bastien F.
Clarke C. LA
El Hihi S.
Li X.
Mikolov T.
Pascanu R.
Shrivastava Anshumali
Sutskever I.
Publication date: 01/01/2015

Users may struggle to formulate an adequate textual query for their information need. Search engines assist users by presenting query suggestions. To preserve the original search intent, suggestions should be context-aware and account for the previous queries issued by the user. Achieving context awareness is challenging due to data sparsity. We present a probabilistic suggestion model that is able to account for sequences of previous queries of arbitrary length. Our novel hierarchical recurrent encoder-decoder architecture allows the model to be sensitive to the order of queries in the context while avoiding data sparsity. Additionally, our model can produce suggestions for rare, or long-tail, queries. The produced suggestions are synthetic and are sampled one word at a time, using computationally cheap decoding techniques. This is in contrast to current synthetic suggestion models, which rely upon machine learning pipelines and hand-engineered feature sets. Results show that our model outperforms existing context-aware approaches in a next-query prediction setting. In addition to query suggestion, our model is general enough to be used in a variety of other applications. Comment: To appear in Conference on Information and Knowledge Management (CIKM) 201

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System
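
The hierarchy in the query suggestion abstract can be sketched in a few lines: a query-level encoder turns each query (a word sequence) into a vector, and a session-level encoder then runs over those query vectors in order, so the session context depends on query order. The sketch uses plain tanh recurrent cells with small random weights for brevity; the cell type, the 4-dimensional size, the toy vocabulary and embeddings are hypothetical, and the decoder that samples suggestions word by word is omitted.

```python
import math
import random

DIM = 4  # hypothetical embedding and hidden-state size
random.seed(0)


def rand_matrix(rows: int, cols: int) -> list[list[float]]:
    """Small random weight matrix (illustrative, untrained)."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_step(W, U, h, x):
    """One tanh RNN step: h' = tanh(W h + U x)."""
    return [
        math.tanh(sum(W[i][j] * h[j] for j in range(len(h)))
                  + sum(U[i][j] * x[j] for j in range(len(x))))
        for i in range(len(W))
    ]


def encode(seq, W, U):
    """Run an RNN over a sequence of vectors and return the final hidden state."""
    h = [0.0] * len(W)
    for x in seq:
        h = rnn_step(W, U, h, x)
    return h


# Hypothetical word embeddings for a toy vocabulary.
vocab = ["cheap", "flights", "paris", "hotels"]
emb = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in vocab}

# Separate parameters for the query-level and session-level encoders.
Wq, Uq = rand_matrix(DIM, DIM), rand_matrix(DIM, DIM)
Ws, Us = rand_matrix(DIM, DIM), rand_matrix(DIM, DIM)


def session_context(queries: list[str]) -> list[float]:
    """Encode each query into a vector, then encode the sequence of query vectors."""
    query_vecs = [encode([emb[w] for w in q.split()], Wq, Uq) for q in queries]
    return encode(query_vecs, Ws, Us)


ctx = session_context(["cheap flights", "paris hotels"])
```

Because the session-level encoder consumes query vectors rather than raw words, a session of any length is summarized in one fixed-size vector, and swapping the order of the two queries generally changes `ctx`.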

Energy-based temporal neural networks for imputing missing values

Author: G.E. Hinton
G.W. Taylor
H. Lee
I. Sutskever
J. Besag
J. Domke
J. Ngiam
P. Mirowski
P. Vincent
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2012

Imputing missing values in high-dimensional time series is a difficult problem. Some previous approaches to the problem [11,8] trained neural architectures as probabilistic models of the data. However, we argue that this approach is not optimal. We propose to view temporal neural networks with latent variables as energy-based models and to train them for missing value recovery directly. In this paper we introduce two energy-based models. The first model is based on a one-dimensional convolution and the second model utilizes a recurrent neural network. We demonstrate how ideas from the energy-based learning framework can be used to train these models to recover missing values. The models are evaluated on a motion capture dataset.

Crossref

Ghent University Academic Bibliography
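
The energy-based imputation abstract treats recovery of missing values as finding values that give low energy under a model. The toy sketch below shows only that inference step: it substitutes a fixed quadratic smoothness energy, E(x) = sum over t of (x[t] - x[t-1])^2, for the paper's learned convolutional and recurrent energies, and fills the missing entries by gradient descent on E while the observed entries stay clamped. The energy, learning rate and step count are assumptions.

```python
def energy(x: list[float]) -> float:
    """Toy smoothness energy: sum of squared differences between successive values."""
    return sum((x[t] - x[t - 1]) ** 2 for t in range(1, len(x)))


def energy_grad(x: list[float]) -> list[float]:
    """Gradient of energy() with respect to each x[t]."""
    g = [0.0] * len(x)
    for t in range(1, len(x)):
        d = x[t] - x[t - 1]
        g[t] += 2 * d
        g[t - 1] -= 2 * d
    return g


def impute(series: list, lr: float = 0.1, steps: int = 500) -> list[float]:
    """Fill None entries by gradient descent on energy(); observed values stay fixed."""
    missing = [i for i, v in enumerate(series) if v is None]
    x = [0.0 if v is None else float(v) for v in series]
    for _ in range(steps):
        g = energy_grad(x)
        for i in missing:  # only the missing entries are updated
            x[i] -= lr * g[i]
    return x
```

For `impute([0.0, None, None, 3.0])` the descent converges to approximately `[0.0, 1.0, 2.0, 3.0]`, the lowest-energy completion under this smoothness energy. A learned energy would instead favour completions that look like the training data.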

Hyperspectral image classification with convolutional neural networks

Author: He X.
Krizhevsky A.
Lecun Y.
Lee H.
Sutskever I.
Wang T.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015

Crossref

Ghent University Academic Bibliography

The Case for Learned Index Structures

Author: Abadi M.
Armbrust M.
Böhm M.
Chang F.
Goodfellow I.
Grossi R.
Lehman T. J.
Litwin W.
Magdon-Ismail M.
Miller D. J.
Moerkotte G.
Sutskever I.
You S.
Publication date: 30/04/2018

Indexes are models: a B-Tree-Index can be seen as a model that maps a key to the position of a record within a sorted array, a Hash-Index as a model that maps a key to the position of a record within an unsorted array, and a BitMap-Index as a model that indicates whether a data record exists. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show that by using neural nets we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order of magnitude in memory over several real-world data sets. More importantly, though, we believe that the idea of replacing core components of a data management system with learned models has far-reaching implications for future system designs, and that this work provides just a glimpse of what might be possible.

arXiv.org e-Print Archive

Crossref
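
The learned index abstract frames a B-Tree as a model that maps a key to its position in a sorted array. A minimal sketch of that idea: a model predicts the position, and the model's worst-case error over the stored keys bounds a local search that corrects the guess. A single least-squares line stands in here for the neural nets the paper uses, and the class name `LinearLearnedIndex` is hypothetical.

```python
import bisect


class LinearLearnedIndex:
    """Predicts a key's position in a sorted array, then searches within the error bound."""

    def __init__(self, keys):
        self.keys = list(keys)  # assumed non-empty and sorted ascending
        n = len(self.keys)
        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case prediction error over the stored keys bounds every search.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key):
        """Model's guess at the key's position, clamped to the array bounds."""
        guess = round(self.slope * key + self.intercept)
        return min(max(guess, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key, or None if it is not stored."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        i = bisect.bisect_left(self.keys, key, lo, hi)  # binary search only in [lo, hi)
        if i < hi and self.keys[i] == key:
            return i
        return None
```

For example, with `keys = [i * i for i in range(200)]`, `LinearLearnedIndex(keys).lookup(400)` returns 20 and `lookup(2)` returns None. The better the model fits the key distribution, the smaller `max_err` and the narrower the search window, which is where the paper's speed and memory argument comes from.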

Improving Variational Autoencoders with Inverse Autoregressive Flow

Author: Chen X.
Josefowicz R.
Kingma D.
Salimans T.
Sutskever I.
Welling M.
Publication venue: Curran Associates, Inc.
Publication date: 01/01/2017

International Migration, Integration and Social Cohesion online publications

Neural NILM: Deep Neural Networks Applied to Energy Disaggregation

Author: Atlas L. E.
Bastien F.
Graves A.
Hart G. W.
Kolter J. Z.
Krizhevsky A.
Leeb S. B.
Lowe D. G.
Rumelhart D. E.
Sutskever I.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 28/09/2015

Energy disaggregation estimates appliance-by-appliance electricity consumption from a single meter that measures the whole home's electricity demand. Recently, deep neural networks have driven remarkable improvements in classification performance in neighbouring machine learning fields such as image classification and automatic speech recognition. In this paper, we adapt three deep neural network architectures to energy disaggregation: 1) a form of recurrent neural network called 'long short-term memory' (LSTM); 2) denoising autoencoders; and 3) a network which regresses the start time, end time and average power demand of each appliance activation. We use seven metrics to test the performance of these algorithms on real aggregate power data from five appliances. Tests are performed against a house not seen during training and against houses seen during training. We find that all three neural nets achieve better F1 scores (averaged over all five appliances) than either combinatorial optimisation or factorial hidden Markov models and that our neural net algorithms generalise well to an unseen house. Comment: To appear in ACM BuildSys'15, November 4-5, 2015, Seou

arXiv.org e-Print Archive

CiteSeerX

Crossref

Spiral - Imperial College Digital Repository
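
The third Neural NILM architecture outputs, for each appliance activation, a start time, an end time and an average power. The sketch below shows one way such regressed values could be painted back into a per-timestep power series and scored with an on/off F1 measure like the one the abstract reports. The integer time steps, the inclusive-start/exclusive-end convention, the 10 W "on" threshold and both function names are assumptions for illustration.

```python
def rectangles_to_series(activations, length):
    """Convert (start, end, avg_power) activations into a power time series.

    start is inclusive and end is exclusive, both in integer time steps;
    overlapping activations are summed and out-of-range steps are ignored.
    """
    series = [0.0] * length
    for start, end, power in activations:
        for t in range(max(start, 0), min(end, length)):
            series[t] += power
    return series


def on_off_f1(predicted, actual, threshold=10.0):
    """F1 score of appliance on/off state, where 'on' means power > threshold (watts)."""
    pred_on = [p > threshold for p in predicted]
    true_on = [a > threshold for a in actual]
    tp = sum(p and a for p, a in zip(pred_on, true_on))
    fp = sum(p and not a for p, a in zip(pred_on, true_on))
    fn = sum(a and not p for p, a in zip(pred_on, true_on))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For instance, a predicted activation `(2, 5, 2000.0)` over 8 steps covers steps 2 to 4. Scored against a true activation spanning steps 3 to 5, it has 2 true positives, 1 false positive and 1 false negative, giving an F1 of 2/3.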

Neural Networks for Information Retrieval

Author: Bahdanau D.
Bordes A.
Goodfellow I.
Hermann K. M.
Hu B.
Kingma D.
Krizhevsky A.
Kusner M. J.
Lin Y.
Lu Z.
Mikolov T.
Robertson S. E.
Srivastava N.
Sutskever I.
Vinyals O.
Weston J.
Publication date: 01/01/2017

Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many different approaches for many different IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. Additionally, it is interesting to see what key insights into IR problems the new technologies are able to give us. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR research. It covers key architectures, as well as the most promising future directions. Comment: Overview of full-day tutorial at SIGIR 201

arXiv.org e-Print Archive

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE